6 research outputs found

    Generalization of the decremental performance analysis to differential analysis

    A crucial step in application performance analysis is the accurate detection of program bottlenecks. A bottleneck is any event that extends execution time. Determining the cause of bottlenecks is important for application developers because it enables them to detect flaws in code design and code generation.
    Bottleneck detection is becoming a difficult art. Techniques such as event counting, which once found bottlenecks easily, have become less effective because of the increasing complexity of modern microprocessors and the introduction of parallelism at several levels. New analysis approaches are therefore needed to meet these challenges.
    Our work focuses on performance analysis and bottleneck detection for compute-intensive loops in scientific applications. We work on Decan, a performance analysis and bottleneck detection tool that offers a promising approach called Decremental Analysis. The tool operates at the binary level: it performs controlled modifications on the instructions of a loop and compares the modified version (called a variant) with the original. The goal is to assess the cost of specific events, and thus whether or not bottlenecks exist.
    Our first contribution extends Decan with new variants that we designed, tested, and validated. Based on these variants, we developed a more thorough analysis, called Differential Analysis, which we used to characterize hot loops and find their bottlenecks. We later integrated the tool and this analysis into a broader performance analysis methodology (Pamda) that coordinates several analysis tools to make application performance analysis more efficient.
    Second, we introduce several improvements to Decan. Techniques developed to preserve the control flow of the modified programs allow the tool to be used on real applications instead of extracted kernels. Support for parallel programs (thread- and process-based) was also added.
    Finally, since Decan relies primarily on execution time as the metric for comparing variants, we conduct a statistical study of the stability, precision, and overhead of other hardware-generated events to assess whether they could also be used.
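    The core idea of decremental analysis, comparing a modified loop (variant) against the original, reduces to a simple cost computation once timings are available. The sketch below assumes repeated timings have already been collected for each version and does not show the binary patching itself; the variant names `no_loads_stores` and `no_arithmetic`, the use of the median, and the 10% threshold are hypothetical illustrations, not Decan's actual variants, statistics, or API.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Repeated timings of one loop version (e.g. cycles per iteration)."""
    name: str
    samples: tuple[float, ...]

    def typical(self) -> float:
        # Assumption: the median is used because it is less sensitive than
        # the mean to occasional noisy runs.
        return statistics.median(self.samples)


def variant_cost(original: Measurement, variant: Measurement) -> float:
    """Fraction of the original loop time saved by the variant's modification.

    A large value suggests that the instructions removed or changed in the
    variant contribute significantly to the loop's execution time.
    """
    base = original.typical()
    if base <= 0:
        raise ValueError("original timing must be positive")
    return (base - variant.typical()) / base


def rank_costs(original: Measurement, variants: list[Measurement],
               threshold: float = 0.1) -> list[tuple[str, float]]:
    """Return variants whose cost is at least `threshold`, most costly first."""
    costs = [(v.name, variant_cost(original, v)) for v in variants]
    return sorted((c for c in costs if c[1] >= threshold),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    original = Measurement("original", (10.1, 9.9, 10.0))
    variants = [
        Measurement("no_loads_stores", (4.0, 4.1, 3.9)),  # hypothetical variant
        Measurement("no_arithmetic", (9.5, 9.6, 9.4)),    # hypothetical variant
    ]
    print(rank_costs(original, variants))
```

    Here removing memory accesses cuts the median time from 10.0 to 4.0 (a cost of 0.6), while removing arithmetic saves only 0.05 and falls below the threshold, which would point to memory traffic as the dominant bottleneck in this made-up example.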

    Domain knowledge specification for energy tuning

    To overcome the energy-consumption challenges of HPC systems, the European Union Horizon 2020 project READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing) uses an online auto-tuning approach to improve the energy efficiency of HPC applications. The READEX methodology pre-computes optimal system configurations, such as the CPU frequency, for instances of program regions at design time, and at runtime switches to the configuration given in the tuning model whenever a region is executed. READEX goes beyond previous approaches by exploiting dynamic changes in a region's characteristics through region- and characteristic-specific system configurations. While the tool suite supports an automatic approach, domain knowledge, such as the structure and characteristics of the application and its application tuning parameters, can significantly help create a more refined tuning model. This paper presents the means available for an application expert to provide domain knowledge and presents tuning results for some benchmarks.
    Web of Science, 316, art. no. E465
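    The design-time/runtime split described above can be pictured as a lookup table from program regions to system configurations that is consulted each time a region starts. The following sketch assumes a single knob (CPU frequency) and a caller-supplied `apply` callback; `TuningModel`, `RuntimeSwitcher`, and the region names are hypothetical and do not reflect READEX's actual interfaces, and no real frequency control is performed.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass(frozen=True)
class SystemConfig:
    cpu_freq_ghz: float  # further knobs could be added as extra fields


@dataclass
class TuningModel:
    """Per-region configurations, assumed to be computed at design time."""
    default: SystemConfig
    per_region: dict[str, SystemConfig] = field(default_factory=dict)

    def lookup(self, region: str) -> SystemConfig:
        return self.per_region.get(region, self.default)


class RuntimeSwitcher:
    """Applies the tuning model's configuration whenever a region is entered."""

    def __init__(self, model: TuningModel, apply: Callable[[SystemConfig], None]):
        self.model = model
        self.apply = apply  # hypothetical hook that would set the hardware knobs
        self.current = model.default

    @contextmanager
    def region(self, name: str) -> Iterator[SystemConfig]:
        previous = self.current
        target = self.model.lookup(name)
        if target != previous:  # only switch when the configuration changes
            self.apply(target)
            self.current = target
        try:
            yield target
        finally:
            if self.current != previous:  # restore the enclosing configuration
                self.apply(previous)
                self.current = previous


if __name__ == "__main__":
    applied: list[SystemConfig] = []
    model = TuningModel(SystemConfig(2.5), {"solver": SystemConfig(1.8)})
    switcher = RuntimeSwitcher(model, applied.append)
    with switcher.region("solver"):
        pass  # region body would run here at 1.8 GHz
    with switcher.region("io"):  # not in the model: default, so no switch
        pass
    print(applied)
```

    Switching only when the target differs from the current setting avoids paying the cost of a frequency change for regions that share a configuration, and restoring on exit keeps nested regions consistent.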

    Domain Knowledge Specification for Energy Tuning

    The European Horizon 2020 project READEX is developing a tool suite for dynamic energy tuning of HPC applications. While the tool suite supports an automatic approach, domain knowledge can significantly help in both the analysis and the runtime tuning phases. This paper presents the means available in READEX for application experts to provide their expert knowledge to the tool suite.

    Generalization of the decremental performance analysis to differential analysis

    No full text

    Quantifying performance bottleneck cost through differential analysis

    No full text
    International audience

    The READEX formalism for automatic tuning for energy efficiency

    Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy costs. Although high-performance computing (HPC) applications are compute-centric, they still exhibit varying computational characteristics in different regions of the program, such as compute-, memory-, and I/O-bound code regions. Some of today's clusters already offer mechanisms to adjust the system to the resource requirements of an application, e.g., by controlling the CPU frequency. However, manually tuning for improved energy efficiency is a tedious and painstaking task that application developers often neglect. The European Union's Horizon 2020 project READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing) aims to develop a tool-aided approach for improving the energy efficiency of current and future HPC applications. To reach this goal, the READEX project combines technologies from two ends of the compute spectrum, embedded systems and HPC, in a split design-time/runtime methodology. From the HPC domain, the Periscope Tuning Framework (PTF) is extended to perform dynamic auto-tuning of fine-grained application regions using the systems scenario methodology, which was originally developed to improve energy efficiency in embedded systems. This paper introduces the concepts of the READEX project, its envisioned implementation, and preliminary results that demonstrate the feasibility of this approach.
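    As a rough illustration of the design-time side, choosing a configuration per region can be framed as picking, among measured candidate settings, the one with the lowest energy (average power multiplied by time). This is a simplified exhaustive-search sketch under assumed inputs; the region names and measurements are invented for illustration, and PTF's actual search strategies and objective functions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time_s: float   # region execution time at this setting
    power_w: float  # average power draw at this setting

    @property
    def energy_j(self) -> float:
        # Energy = average power x execution time.
        return self.power_w * self.time_s


def best_frequencies(measurements: dict[str, dict[float, Observation]]) -> dict[str, float]:
    """For each region, return the CPU frequency (GHz) with the lowest measured energy."""
    return {
        region: min(by_freq, key=lambda f: by_freq[f].energy_j)
        for region, by_freq in measurements.items()
    }


if __name__ == "__main__":
    measurements = {
        # Compute-bound region: lowering the frequency slows it down a lot.
        "compute": {2.5: Observation(1.0, 100.0), 1.8: Observation(1.5, 80.0)},
        # Memory-bound region: lowering the frequency barely affects runtime.
        "memory": {2.5: Observation(2.0, 90.0), 1.8: Observation(2.05, 60.0)},
    }
    print(best_frequencies(measurements))
```

    With these invented numbers, the compute-bound region keeps the higher frequency (100 J vs. 120 J), while the memory-bound region is cheaper at the lower one (about 123 J vs. 180 J), which is the kind of per-region difference the abstract says READEX aims to exploit.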